AITopics | base vector

Collaborating Authors

base vector

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Exploring possible vector systems for faster training of neural networks with preconfigured latent spaces

Gabdullin, Nikita

arXiv.org Artificial IntelligenceDec-11-2025

The overall neural network (NN) performance is closely related to the properties of its embedding distribution in latent space (LS). It has recently been shown that predefined vector systems, specifically An root system vectors, can be used as targets for latent space configurations (LSC) to ensure the desired LS structure. One of the main LSC advantage is the possibility of training classifier NNs without classification layers, which facilitates training NNs on datasets with extremely large numbers of classes. This paper provides a more general overview of possible vector systems for NN training along with their properties and methods for vector system construction. These systems are used to configure LS of encoders and visual transformers to significantly speed up ImageNet-1K and 50k-600k classes LSC training. It is also shown that using the minimum number of LS dimensions for a specific number of classes results in faster convergence. The latter has potential advantages for reducing the size of vector databases used to store NN embeddings.

artificial intelligence, configuration, machine learning, (17 more...)

arXiv.org Artificial Intelligence

2512.07509

Genre: Research Report (0.66)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Add feedback

Large-scale L-BFGS using MapReduce

Weizhu Chen, Zhenghao Wang, Jingren Zhou

Neural Information Processing SystemsOct-2-2025, 22:32:17 GMT

L-BFGS has been applied as an effective parameter estimation method for various machine learning algorithms since 1980s. With an increasing demand to deal with massive instances and variables, it is important to scale up and parallelize L-BFGS effectively in a distributed system. In this paper, we study the problem of parallelizing the L-BFGS algorithm in large clusters of tens of thousands of shared-nothing commodity machines. First, we show that a naive implementation of L-BFGS using Map-Reduce requires either a significant amount of memory or a large number of map-reduce steps with negative performance impact. Second, we propose a new L-BFGS algorithm, called V ector-free L-BFGS, which avoids the expensive dot product operations in the two loop recursion and greatly improves computation efficiency with a great degree of parallelism. The algorithm scales very well and enables a variety of machine learning algorithms to handle a massive number of variables over large datasets. We prove the mathematical equivalence of the new V ector-free L-BFGS and demonstrate its excellent performance and scalability using real-world machine learning problems with billions of variables in production clusters.

algorithm, l-bfg, map-reduce step, (16 more...)

Neural Information Processing Systems

Industry: Education (0.34)

Technology:

Information Technology > Data Science > Data Mining > Big Data (0.63)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.48)

Add feedback

ECGTwin: Personalized ECG Generation Using Controllable Diffusion Model

Lai, Yongfan, Liu, Bo, Guan, Xinyan, Zhao, Qinghao, Li, Hongyan, Hong, Shenda

arXiv.org Artificial IntelligenceAug-6-2025

Personalized electrocardiogram (ECG) generation is to simulate a patient's ECG digital twins tailored to specific conditions. It has the potential to transform traditional healthcare into a more accurate individualized paradigm, while preserving the key benefits of conventional population-level ECG synthesis. However, this promising task presents two fundamental challenges: extracting individual features without ground truth and injecting various types of conditions without confusing generative model. In this paper, we present ECGTwin, a two-stage framework designed to address these challenges. In the first stage, an Individual Base Extractor trained via contrastive learning robustly captures personal features from a reference ECG. In the second stage, the extracted individual features, along with a target cardiac condition, are integrated into the diffusion-based generation process through our novel AdaX Condition Injector, which injects these signals via two dedicated and specialized pathways. Both qualitative and quantitative experiments have demonstrated that our model can not only generate ECG signals of high fidelity and diversity by offering a fine-grained generation controllability, but also preserving individual-specific features. Furthermore, ECGTwin shows the potential to enhance ECG auto-diagnosis in downstream application, confirming the possibility of precise personalized healthcare solutions.

machine learning, natural language, personalized ecg generation, (14 more...)

arXiv.org Artificial Intelligence

2508.0272

Country:

North America (0.68)
Asia > China (0.14)

Genre: Research Report > New Finding (1.00)

Industry:

Health & Medicine > Therapeutic Area > Cardiology/Vascular Diseases (1.00)
Health & Medicine > Diagnostic Medicine (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Add feedback

ShortcutProbe: Probing Prediction Shortcuts for Learning Robust Models

Zheng, Guangtao, Ye, Wenqian, Zhang, Aidong

arXiv.org Artificial IntelligenceJun-19-2025

Deep learning models often achieve high performance by inadvertently learning spurious correlations between targets and non-essential features. For example, an image classifier may identify an object via its background that spuriously correlates with it. This prediction behavior, known as spurious bias, severely degrades model performance on data that lacks the learned spurious correlations. Existing methods on spurious bias mitigation typically require a variety of data groups with spurious correlation annotations called group labels. However, group labels require costly human annotations and often fail to capture subtle spurious biases such as relying on specific pixels for predictions. In this paper, we propose a novel post hoc spurious bias mitigation framework without requiring group labels. Our framework, termed ShortcutProbe, identifies prediction shortcuts that reflect potential non-robustness in predictions in a given model's latent space. The model is then retrained to be invariant to the identified prediction shortcuts for improved robustness. We theoretically analyze the effectiveness of the framework and empirically demonstrate that it is an efficient and practical tool for improving a model's robustness to spurious bias on diverse datasets.

artificial intelligence, machine learning, prediction shortcut, (14 more...)

arXiv.org Artificial Intelligence

2505.1391

Country: North America > United States (0.46)

Genre: Research Report (0.82)

Industry: Health & Medicine (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Add feedback

Task Vector Quantization for Memory-Efficient Model Merging

Kim, Youngeun, Lee, Seunghwan, Jung, Aecheon, Ryu, Bogon, Hong, Sungeun

arXiv.org Artificial IntelligenceMar-10-2025

Model merging enables efficient multi-task models by combining task-specific fine-tuned checkpoints. However, storing multiple task-specific checkpoints requires significant memory, limiting scalability and restricting model merging to larger models and diverse tasks. In this paper, we propose quantizing task vectors (i.e., the difference between pre-trained and fine-tuned checkpoints) instead of quantizing fine-tuned checkpoints. We observe that task vectors exhibit a narrow weight range, enabling low precision quantization (up to 4 bit) within existing task vector merging frameworks. To further mitigate quantization errors within ultra-low bit precision (e.g., 2 bit), we introduce Residual Task Vector Quantization, which decomposes the task vector into a base vector and offset component. We allocate bits based on quantization sensitivity, ensuring precision while minimizing error within a memory budget. Experiments on image classification and dense prediction show our method maintains or improves model merging performance while using only 8% of the memory required for full-precision checkpoints.

baseline, evaluation, fp32, (13 more...)

arXiv.org Artificial Intelligence

2503.06921

Country: Asia > Middle East > Jordan (0.04)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.67)

Add feedback

Sparse Auto-Encoder Interprets Linguistic Features in Large Language Models

Jing, Yi, Yao, Zijun, Ran, Lingxu, Guo, Hongzhu, Wang, Xiaozhi, Hou, Lei, Li, Juanzi

arXiv.org Artificial IntelligenceFeb-27-2025

Large language models (LLMs) excel in tasks that require complex linguistic abilities, such as reference disambiguation and metaphor recognition/generation. Although LLMs possess impressive capabilities, their internal mechanisms for processing and representing linguistic knowledge remain largely opaque. Previous work on linguistic mechanisms has been limited by coarse granularity, insufficient causal analysis, and a narrow focus. In this study, we present a systematic and comprehensive causal investigation using sparse auto-encoders (SAEs). We extract a wide range of linguistic features from six dimensions: phonetics, phonology, morphology, syntax, semantics, and pragmatics. We extract, evaluate, and intervene on these features by constructing minimal contrast datasets and counterfactual sentence datasets. We introduce two indices-Feature Representation Confidence (FRC) and Feature Intervention Confidence (FIC)-to measure the ability of linguistic features to capture and control linguistic phenomena. Our results reveal inherent representations of linguistic knowledge in LLMs and demonstrate the potential for controlling model outputs. This work provides strong evidence that LLMs possess genuine linguistic knowledge and lays the foundation for more interpretable and controllable language modeling in future research.

base vector, computational linguistic, linguistic feature, (14 more...)

arXiv.org Artificial Intelligence

2502.20344

Country:

Europe > Austria > Vienna (0.14)
Asia > Singapore (0.05)
Asia > Thailand > Bangkok > Bangkok (0.04)
(10 more...)

Genre: Research Report > New Finding (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Add feedback

Large-scale L-BFGS using MapReduce

Weizhu Chen, Zhenghao Wang, Jingren Zhou

Neural Information Processing SystemsFeb-9-2025, 22:59:29 GMT

L-BFGS has been applied as an effective parameter estimation method for various machine learning algorithms since 1980s. With an increasing demand to deal with massive instances and variables, it is important to scale up and parallelize L-BFGS effectively in a distributed system. In this paper, we study the problem of parallelizing the L-BFGS algorithm in large clusters of tens of thousands of shared-nothing commodity machines. First, we show that a naive implementation of L-BFGS using Map-Reduce requires either a significant amount of memory or a large number of map-reduce steps with negative performance impact. Second, we propose a new L-BFGS algorithm, called Vector-free L-BFGS, which avoids the expensive dot product operations in the two loop recursion and greatly improves computation efficiency with a great degree of parallelism. The algorithm scales very well and enables a variety of machine learning algorithms to handle a massive number of variables over large datasets. We prove the mathematical equivalence of the new Vector-free L-BFGS and demonstrate its excellent performance and scalability using real-world machine learning problems with billions of variables in production clusters.

artificial intelligence, data mining, machine learning, (19 more...)

Neural Information Processing Systems

Industry: Education (0.34)

Technology:

Information Technology > Data Science > Data Mining > Big Data (0.62)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.48)

Add feedback

Maintaining Adversarial Robustness in Continuous Learning

Ru, Xiaolei, Cao, Xiaowei, Liu, Zijia, Moore, Jack Murdoch, Zhang, Xin-Ya, Zhu, Xia, Wei, Wenjia, Yan, Gang

arXiv.org Artificial IntelligenceAug-13-2024

Continual learning and adversarial robustness are distinct and important research directions in artificial intelligence, each of which has witnessed significant advances. The former addresses a critical challenge known as catastrophic forgetting, where a neural network trained on a sequential of new tasks typically exhibits a dramatic drop in its performance on previously learned tasks if the model cannot revisit the previous data [1]. The latter focuses on developing defenses against adversarial attacks that can deceive models into confidently misclassifying objects by adding subtle targeted perturbations to the input images often imperceptible to human observers [2]. However, the evolution of the neural network's adversarial robustness in context of continuous learning remains underexplored. In our experiments, we observe that adversarial robustness enhanced by well-designed defense algorithms on previous data is easily lost when the neural network updates its weights to accommodate new tasks, resulting in a phenomenon similar to catastrophic forgetting. This presents an intriguing challenge: how can we maintain the adversarial robustness during continuous learning? In other words, the objective of continuous learning expands to concurrently encompass (classification) performance and adversarial robustness. In this paper, we present a solution by proposing a novel gradient projection technique called Double Gradient Projection (DGP), which inherently enables collaboration with a class of defense algorithms that enhance robustness through sample gradient smoothing. DGP is grounded on a theoretical hypothesis that a neural network's robustness can be maintained if the smoothness of sample gradients from previous data remain unchanged after weight updates.

artificial intelligence, machine learning, neural network, (16 more...)

arXiv.org Artificial Intelligence

2402.11196

Country:

Asia > China > Shanghai > Shanghai (0.04)
Asia > China > Guangdong Province > Shenzhen (0.04)

Genre: Research Report > New Finding (0.34)

Industry: Education > Educational Setting > Continuing Education (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Add feedback

Large-scale L-BFGS using MapReduce

Neural Information Processing SystemsMar-13-2024, 13:37:00 GMT

L-BFGS has been applied as an effective parameter estimation method for various machine learning algorithms since 1980s. With an increasing demand to deal with massive instances and variables, it is important to scale up and parallelize L-BFGS effectively in a distributed system. In this paper, we study the problem of parallelizing the L-BFGS algorithm in large clusters of tens of thousands of shared-nothing commodity machines. First, we show that a naive implementation of L-BFGS using Map-Reduce requires either a significant amount of memory or a large number of map-reduce steps with negative performance impact. Second, we propose a new L-BFGS algorithm, called Vector-free L-BFGS, which avoids the expensive dot product operations in the two loop recursion and greatly improves computation efficiency with a great degree of parallelism. The algorithm scales very well and enables a variety of machine learning algorithms to handle a massive number of variables over large datasets. We prove the mathematical equivalence of the new Vector-free L-BFGS and demonstrate its excellent performance and scalability using real-world machine learning problems with billions of variables in production clusters.

algorithm, l-bfg, map-reduce step, (16 more...)

Neural Information Processing Systems

Industry: Education (0.34)

Technology:

Information Technology > Data Science > Data Mining > Big Data (0.62)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.48)

Add feedback

Human Understandable Explanation Extraction for Black-box Classification Models Based on Matrix Factorization

Kim, Jaedeok, Seo, Jingoo

arXiv.org Machine LearningSep-18-2017

In recent years, a number of artificial intelligent services have been developed such as defect detection system or diagnosis system for customer services. Unfortunately, the core in these services is a black-box in which human cannot understand the underlying decision making logic, even though the inspection of the logic is crucial before launching a commercial service. Our goal in this paper is to propose an analytic method of a model explanation that is applicable to general classification models. To this end, we introduce the concept of a contribution matrix and an explanation embedding in a constraint space by using a matrix factorization. We extract a rule-like model explanation from the contribution matrix with the help of the nonnegative matrix factorization. To validate our method, the experiment results provide with open datasets as well as an industry dataset of a LTE network diagnosis and the results show our method extracts reasonable explanations.

artificial intelligence, explanation, machine learning, (18 more...)

arXiv.org Machine Learning

1709.06201

Genre: Research Report > New Finding (0.66)

Industry: Transportation > Air (0.62)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.94)
Information Technology > Artificial Intelligence > Representation & Reasoning > Expert Systems (0.65)

Add feedback